Abstract

The census survey of undergraduates attending a major 
research university system presents an opportunity to measure 
both disciplinary and institutional differences in students' 
academic experience. Results from nearly 60,000 responses 
(38% response rate) from the 2006 administration found greater 
variance among majors within an institution than between 
equivalent majors across institutions. Cluster analysis techniques 
were employed to establish disciplinary patterns, with 
traditional distinctions between hard and soft sciences generally 
supported. Reporting practices called into question range 
from institutional comparisons that ignore academic program 
mix and discipline to campus performance comparisons that 
do not recognize pedagogical differences by academic major. 
More specifically, these results suggest that calls for comparable 
institutional performance measures, as proposed by the 
Spellings Commission, must take into consideration disciplinary 
differences in instruction. 

Introduction 

There is tremendous appeal in the idea that a series of 
aggregate institutional measures of performance, expressed 
in comparative context, will lead to educational improvement 
or at least will uncover more productive use of public and 
student revenue. It is an attractive notion, but it is very 
likely misleading and counterproductive in application. This 
research will concentrate on one example of publicly reported 
institutional performance, student survey
outcomes, but will raise questions that extend 
to related outcomes measures. The performance 
measure in question is institution-level academic 
experience factor scores as measured by limited-response
questionnaire items asked of current
students. The specific example is a survey of 
all undergraduates attending any campus of a 
large, research-extensive state university system
using the University of California Undergraduate
Experience Survey (UCUES). The problems noted
appear to be inherent in similar enterprises, for
example the College Student Report of the National
Survey of Student Engagement (NSSE). It
will be asserted that sufficient evidence exists 
to justify rejection of these measures as valid 
performance indicators and reconsideration of 
the institutional comparison effort prescribed by 
the Spellings Commission until such time as more 
valid measures are developed or data collection 
methodology is changed. 

The problem derives from a commonly 
accepted but largely unsubstantiated 
premise — that the undergraduate experience 
at most campuses shares sufficient common 
characteristics to be fairly and accurately 
measured by single aggregate scores. More 
explicitly, the problem results from a belief 
that there should be sufficient components 
in common that single scores could be valid 
measures and could be used to assess relative 
performance. An interesting dialogue in Inside 
Higher Ed between Banta on one hand and 
Klein, Shavelson, and Benjamin on the other is 
illustrative. This research will support Banta's 
position. In the exchange, Banta wrote first 
and issued a warning about the Spellings 
Commission's call for "the use of standardized 
tests of general intellectual skills to compare the 
effectiveness of colleges and universities" (2007,
p. 1). Banta referred to her own and her colleagues'
considerable record in assessment and noted
many well-established problems with sample-based
institutional scores on standardized
instruments. As a more viable alternative, Banta
proposed electronic portfolios and measures
based in academic disciplines. It is Banta's
recognition of variance by academic disciplines 
that is supported by this research. 

Responding to Banta (2007), Klein, Shavelson, 
and Benjamin (2007), who identify themselves 
as being affiliated with the Collegiate Learning 
Assessment (CLA) program, wrote that the CLA 
measures abilities that "cut across academic
disciplines and ... assesses these competencies
with realistic open-ended measures that present 
students with tasks that all college graduates 
should be able to perform" (p. 2). They go on 
to assert the public interest in performance 
data to determine whether "the students at a 
given school are generally making more or less 
progress in developing these abilities than are 
other students" (p. 2) and conclude by stating 
that the CLA is the best currently available source 
of that information. Their argument in support 
of comparative sample-based summary scores 
generally, the CLA specifically, and against 
measuring those skills as taught and learned 
in academic disciplines appears to be
twofold: first, that these are "broad competencies
that are mentioned in college and university 
mission statements" (p. 2), and second, that 
legislators, college administrators, many faculty, 
college-bound students and their parents, the 
general public, and employers want evidence of 
competencies regardless of academic major. 

Whether one accepts the conventional wisdom/public
interest argument made by Klein et al. or the
experience of Banta and colleagues, a more basic
issue may be the lack of common course
experience among undergraduates.
There is very little general education in 
common at large public research universities. 

One illustration of the variance in student 
experiences at a large public research university
is provided by Chatman (2004), who examined
general education policy and student behavior
at one institution. He found more than one
thousand courses and millions of combinations
of courses that might satisfy general education
requirements, and only four courses were taken 
by a majority of students. Perhaps that should not 
be surprising for a campus with a cafeteria system 
and more than one hundred undergraduate 
academic majors. Given such a large number of 
majors and courses that can be counted toward 
satisfying requirements, the notion of a widely 
shared, common experience would seem to be 
an invalid premise on its face. And yet, it is a 
recurring theme from both inside and outside the 
academy. 

The external call for comparable performance 
measures most recently includes Education 
Department Secretary Spellings' Commission 
on the Future of Higher Education. On page 25 of
A Test of Leadership: Charting the Future of U.S.
Higher Education (2006), under recommended
changes to accrediting standards, is the following 
(emphasis added): 

Accreditation agencies should make 
performance outcomes, including completion 
rates and student learning, the core of 
their assessment as a priority over inputs 
or processes. A framework that aligns and 
expands existing accreditation standards 
should be established to (i) allow comparisons 
among institutions regarding learning outcomes 
and other performance measures, ... In
addition, this framework should require that 
the accreditation process be more open and 
accessible by making the findings of final reviews 
easily accessible to the public and increasing 
public and private sector representation in the 
governance of accreditation organizations and 
on review teams. Accreditation, once primarily 
a private relationship between an agency and 
an institution, now has such important public 
policy implications that accreditors must 
continue and speed up their efforts toward 
transparency as this affects public ends. 



These are admirable standards that higher 
education would likely embrace if it were 
confident that it could effectively measure and 
then communicate the complexity of higher 
education. Modern public research universities 
are academically diverse and, by publicly 
supported agreement, serve extremely diverse 
populations. The accountability strategies 
that have been at least partially successful in 
improving elementary and secondary education 
cannot be easily generalized to postsecondary 
study because postsecondary education is more 
complex by at least an order of magnitude. 
Elementary schools offer few course choices, 
secondary schools several more within a few 
program tracks, and postsecondary institutions 
offer a hundred or more academic majors and 
thousands of courses. Is there cause for concern 
that the Spellings Commission would subject 
higher education to reporting that could only 
grossly oversimplify performance? 

On page 23, the Spellings report (The 
Secretary of Education's Commission on the 
Future of U.S. Higher Education, 2006) cites NSSE 
as an example of student learning assessment, 
stating the following (emphasis added): 

Administered by the Indiana University Center 
for Postsecondary Research, the National 
Survey of Student Engagement (NSSE) 
and its community college counterpart, 
the Community College Survey of Student 
Engagement (CCSSE), survey hundreds 
of institutions annually about student 
participation and engagement in programs 
designed to improve their learning and 
development. The measures of student 
engagement — the time and effort students 
put into educational activities in and out of 
the classroom, from meeting with professors 
to reading books that weren't assigned in 
class — serve as a proxy for the value and quality 
of their educational experience. NSSE and 
CCSSE provide colleges and universities with 
readily usable data to improve that experience 
and create benchmarks against which similar 
institutions can compare themselves. 
NSSE is one of three examples offered, but
attention is focused on the NSSE example here 
because it shares similarities with the source of 
data for this study, UCUES. 

Whether striving to accurately assess a 
performance construct or to assess relative 
institutional performance by comparison, too 
little consideration is given by the Spellings 
Commission, and others who would hold 
higher education accountable, to the question 
of whether institution-level statistics are valid 
measures for the proposed purposes. At least 
in this area of assessment, recent survey-based
evidence provided by NSSE researchers
Nelson Laird, Shoup, and Kuh (2005); Nelson
Laird, Schwarz, Kuh, and Shoup (2006); Pike,
Kuh, Gonyea, and Stratton (2002); and UCUES
researchers Brint, Cantwell, and Hanneman (2008)
and Chatman (2007) indicates that institution-level
measures of student academic experience
may be too crude to reflect real differences in 
performance, especially for large institutions 
offering a wide range of majors and courses, 
because they do not account for disciplinary 
differences in students' academic experience. 

The remarkable importance of discipline to 
general skill acquisition is well established but 
consistently undervalued. 

In a pair of 1973 papers in the Journal of Applied
Psychology, Biglan (1973a, 1973b) offered a
three-dimensional solution using multidimensional
scaling of faculty ratings of subject matter area
similarities. The resulting empirically based,
atheoretical classification system employed
methodology similar to that used in this paper.
Biglan's hard/soft, pure/applied, and life/nonlife
dimensions of disciplines have proven to be
remarkably useful to higher education researchers 
because they have been shown to distinguish 
everything from faculty attitudes and behaviors 
to class size. 

With several colleagues over a number of 
years, John Smart has convincingly demonstrated 
that students' and faculty members' behaviors and attitudes
and academic disciplines can be described by
John Holland types and that students do best in
compatible disciplinary fields in the same way
that employees are most successful in compatible
work environments. Holland types are realistic,
investigative, artistic, social, enterprising, and
conventional. Moreover, commonalities among
types by discipline are reflected in academic
organization structures. An excellent summary of
the work is Smart, Feldman, and Ethington's (2004)
Academic Disciplines: Holland's Theory and the
Study of College Students and Faculty.

There is also recent evidence of discipline-based
differences in general skill acquisition
from nonsurvey-based or mixed-method studies.
These include Janet Donald's integrated review
of research on intellectual development, Learning
to Think (2002); Beyer, Gillmore, and Fisher's
(2007) remarkably complete longitudinal study
of University of Washington undergraduates'
personal, social, and intellectual growth and
development over four years; and Arum, Roksa,
and Velez's (2008) longitudinal study of Collegiate 
Learning Assessment (CLA) involving over 2,300 
students attending 24 institutions. Arum et al. 
concluded, "Our analyses confirm the relevance 
of college major. Students majoring in science 
and math as well as those majoring in social 
sciences and humanities exhibit higher growth 
in cognitive skills, as measured by the CLA, than 
students majoring in business. Students majoring 
in engineering, agriculture and computer science 
also experience more cognitive growth, although 
of smaller magnitude" (p. 11). Consistent
with the work of the other authors, they noted
that there were fields more conducive to the 
acquisition of cognitive skills as measured by 
the CLA: critical thinking, analytical reasoning, 
and written communication. Braxton, Olsen, and 
Simmons (1998) and others have labeled those 
academic areas affinity disciplines. Perhaps the 
clearest statement of the impact of disciplines 
on "general education" was made by Beyer et al. 
(2007). "As UW SOUL [University of Washington's 
Study of Undergraduate Learning] findings made 
clear, learning in college is mediated in all areas 
by the disciplinary context in which it occurs. 
This mediation is not only true for learning in
the major, but also for the courses identified as 
'general education'" (p. 375). While these are very 
important substantiating sources, the current 
study will be limited to survey-based research. 

Relevant Research 

From NSSE, National Survey of Student Engagement

Two publications reporting reliable 
disciplinary differences from NSSE and 
Faculty Survey of Student Engagement (FSSE) 
administrations are Nelson Laird et al.'s 2005 AIR 
paper on deep learning, "Deep Learning and 
College Outcomes: Do Fields of Study Differ?" and
Nelson Laird et al.'s 2006 AIR paper, "Disciplinary 
Differences in Faculty Members' Emphasis on 
Deep Approaches to Learning." Deep learning, 
from an information processing perspective, 
refers to student-generated efforts to increase the 
number and organization of associations formed 
between new information and information 
already in memory. Using student responses 
and a deep learning scale derived from 13 NSSE
questionnaire items, Nelson Laird et al. (2005)
found the following disciplinary differences for 
senior respondents: 

• Students in social sciences, arts and 
humanities, professional programs (e.g., 
architecture, urban planning, nursing), and 
education scored higher on deep learning. 
Business, physical sciences, and engineering 
scored lower on the deep learning scale. 
Biological sciences majors were midrange. 

• The higher-order learning subscale favored
professional and engineering students.

• The other two subscales, integrative learning
and reflective learning, were highest for social 
science and arts and humanities students 
and were lowest for physical sciences and 
engineering students. 

These findings were generally supported when 
the same analytical strategy was applied to 
faculty responses on the FSSE: 

• Education, arts and humanities, and social 
science faculty described using pedagogical
practices that emphasized deep learning more
often, and engineering and physical sciences 
faculty used the practices less often. 

• Higher-order learning techniques were used 
less frequently in biological sciences and were 
uniformly more frequent in the other fields. 

• Use of pedagogical practices to encourage 
integrative learning was highest in education, 
arts and humanities, and social sciences and 
was lowest in the physical sciences. 

• Reflective learning was more frequently 
used in education, arts and humanities, and 
social science and was less frequently used in 
engineering and physical sciences. 

The most common pattern, where arts and 
humanities and social sciences scored higher 
and science and engineering scored lower, was 
consistent from Nelson Laird et al.'s NSSE (2005) 
and FSSE (2006) studies. Based solely on these 
findings, it would be reasonable to assert that 
social sciences and arts and humanities graduates 
would have experienced a better education than 
science and engineering graduates. Of course, it 
would be a more persuasive argument if social 
science and arts and humanities students were in 
greatest demand at graduation and were able to 
command the highest salaries. 

Nelson Laird et al. (2005) cite several publications
reporting advantages of deep learning processing
and conclude, based on an analysis of observed
variance in scores, that there is room for
improvement in every field of study and that
there are good examples of how to improve
within each disciplinary area. While
there were serious limitations with both studies 
(disproportionate participation by discipline 
in the first and faculty self-selection of a single 
course to describe in the second), this research 
will not belabor the argument whether deep 
learning is a valuable and valued construct. 

This study is concerned with the use of this 
institutional measure, or very likely any other 
institutional measure of student academic 
experience, as an indicator of comparative 
institutional performance. Unless it is assumed 
that all academic majors should be taught using 
the same strategies, the data provided by
Nelson Laird et al. (2005, 2006) show that an 
institutional outcome measure of engagement 
would reflect program mix. 

From UCUES 

University of California researchers, Brint, 
Cantwell, and Hanneman at Riverside and 
Chatman at Berkeley, have examined differences
in student academic experience by major
using UCUES results. The first study used data
from the 2006 UCUES administration, in which
more than 150,000 students across a university
system were invited to participate in the survey; 
38% responded overall and more than 32% 
responded at each campus. The Brint et al. study 
examined responses by upper-division students 
completing the academic core component that 
is common to the various UCUES forms. (UCUES 
is composed of a common academic core and 
one of four or five randomly assigned modules, 
depending on campus choice.) Using factor 
analysis to operationally define dimensions of 
student academic engagement (n ≈ 28,000), Brint
et al. found two types of student engagement,
one that they hypothesized to be more typical of
humanities and social sciences and the other 
more typical of the sciences. These hypotheses 
were confirmed. "Students in the arts, humanities, 
and social sciences scored higher than students 
in other majors on the HUMSOC [humanities and 
social sciences culture] scale. Humanities students 
also scored much lower on the SCIENG [science 
and engineering] scale, while natural sciences, 
engineering, and business students scored 
much higher" (p. 391). In addition to expected
differences by major, they found the following 
results: 

• SAT verbal was a significant predictor of 
humanities culture score, and SAT math score 
was a predictor of sciences culture score. 

• Campus was a minor explanatory factor for 
sciences culture and was not associated with 
humanities culture. 



• The humanities culture score was positively
associated with GPA but negatively associated
with study time.

• Sciences culture score was not related to GPA 
but was associated with study time. 

Brint et al. explained the GPA, study time, and 
scale score associations as reflecting disciplinary 
differences in grading practices. Brint et al., like
Nelson Laird et al., proposed overcoming the
observed differences, but, unlike the Nelson Laird
et al. studies, Brint et al. recognized that there
were limitations in each culture. Brint et al. also
identified about 10% of students in both fields as
very engaged, hard working, and active learners
who were exemplars.

Chatman (2007) attempted to replicate 
Nelson Laird et al. (2005) using UCUES census-based
results for a single campus instead of NSSE
sample-based results across many campuses.
Using a five-factor varimax solution, Chatman
found patterns similar to those reported by 
Nelson Laird et al., essentially higher scores for 
engagement in letters and social sciences, lower 
academic engagement scores for engineering 
and physical sciences, and biological sciences in a 
middle range. Chatman (2007) also described an 
example of earlier UCUES results where students 
in engineering at one campus scored lower 
on a long list of academic items than did the 
other students at the same campus but scored 
essentially the same as engineering students at 
other campuses — an applied example of the fact 
that variance is greater across disciplines than 
across campuses. In this engineering instance, 
intra-institutional comparison would have led 
to a dramatically different summative judgment 
of performance than would inter-institutional 
comparison made using the same academic 
discipline at other campuses. 
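The comparison in the engineering example can be made concrete. The Python sketch below uses a balanced campus-by-discipline grid of hypothetical mean scores (invented for illustration, not UCUES results) and compares the variance of the campus means with the variance of the discipline means. It is only a descriptive sketch; a real analysis would work from student-level records with a formal decomposition such as a two-way ANOVA or a multilevel model.

```python
from statistics import mean, pvariance

# Hypothetical mean factor scores on the reported scale (mean 5, SD 2) for a
# balanced grid of campuses x disciplines. Illustrative values, not UCUES data.
cell_means = {
    ("Campus A", "Engineering"): 4.4,
    ("Campus A", "Social Sciences"): 5.4,
    ("Campus A", "Humanities"): 5.6,
    ("Campus B", "Engineering"): 4.5,
    ("Campus B", "Social Sciences"): 5.3,
    ("Campus B", "Humanities"): 5.5,
    ("Campus C", "Engineering"): 4.3,
    ("Campus C", "Social Sciences"): 5.5,
    ("Campus C", "Humanities"): 5.7,
}


def marginal_means(cells, position):
    """Average cell means over the other factor (position 0 = campus, 1 = discipline)."""
    groups = {}
    for key, value in cells.items():
        groups.setdefault(key[position], []).append(value)
    return {label: mean(values) for label, values in groups.items()}


campus_means = marginal_means(cell_means, 0)
discipline_means = marginal_means(cell_means, 1)

# Population variance of the marginal means: how far apart the groups sit.
between_campus = pvariance(list(campus_means.values()))
between_discipline = pvariance(list(discipline_means.values()))

print(f"variance of campus means:     {between_campus:.4f}")
print(f"variance of discipline means: {between_discipline:.4f}")
```

With these made-up values the variance of the campus means is about 0.0007 and the variance of the discipline means about 0.2756, so a campus-level summary says almost nothing about where the scores actually differ.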

Impact of Disciplinary Patterns on Performance 
Scores and Interventions 

Collectively, these NSSE and UCUES results 
suggest that there are real disciplinary differences 
in academic engagement specifically and 
academic experience generally. Given valid 
and reliable disciplinary patterns, institutional
summary scores would appear to be poor 
measures for campuses with diverse majors. How 
might program mix impact the validity of deep 
processing as an institutional measure? Here are 
a few questions with answers that can be inferred 
from the extant research to illustrate the point: 

• Question 1: Why would liberal arts institutions 
be expected to score higher than state 
schools? 

o Liberal arts schools have relatively more 
social science and humanities majors, and 
social science and humanities students 
have higher scores. Conversely, liberal arts 
schools often do not have lower-scoring 
engineering and business majors. 

• Question 2: Explain how institutional scores 
can mask program deficiencies or areas of 
strength when comparing two institutions. 
(Give one example of a masked area of 
strength and one example of a masked 
deficiency.) 

o First, if the deficit occurs at campus A in 
a field with higher scores on average, the 
campus mean could be the same as B if 
there were more students at A in that field.
o Second, if A has an area of strength in a 
field that is expected to score lower, A and 
B could still score the same overall if B had 
fewer students in the same field or more 
students in higher scoring fields. 

• Question 3: Explain why comparing the 
average score for one major to the campus 
average is misleading. 

o Without knowledge of an expected score 
for the major, it is not possible to separate 
disciplinary effects from performance. 
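The answers to Questions 1 and 2 rest on simple arithmetic: a campus mean is an enrollment-weighted average of major means. The sketch below uses hypothetical major scores and enrollment shares (none are study data). Two campuses whose majors score identically still get different campus means because of program mix, and a campus with a real deficit in one field can still outscore a campus with no deficit.

```python
def campus_mean(scores_by_major, share_by_major):
    """Enrollment-weighted mean of major-level scores.

    scores_by_major: mean factor score for each major
    share_by_major: fraction of the campus's students in each major (sums to 1)
    """
    total_share = sum(share_by_major.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("enrollment shares must sum to 1")
    return sum(scores_by_major[m] * share_by_major[m] for m in share_by_major)


# Hypothetical major-level means, identical at campuses A and B.
scores = {"Humanities": 5.6, "Social Sciences": 5.4, "Engineering": 4.4}

# Campus A is liberal-arts-like; campus B has a large engineering program.
share_a = {"Humanities": 0.5, "Social Sciences": 0.4, "Engineering": 0.1}
share_b = {"Humanities": 0.2, "Social Sciences": 0.3, "Engineering": 0.5}

print(round(campus_mean(scores, share_a), 2))  # → 5.4
print(round(campus_mean(scores, share_b), 2))  # → 4.94

# Masked deficit: same mix as A, but humanities scores well below the norm.
scores_d = dict(scores, Humanities=5.0)
print(round(campus_mean(scores_d, share_a), 2))  # → 5.1 (still above B)
```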
Institutional intervention efforts to improve 
scores at a campus with lower scores would 
necessarily be diffuse if the campus were 
ignorant of relative performance by major. Such 
interventions would probably be unsuccessful 
because most faculty would rightly assume that 
they were not part of the problem. Sample-based 
statistics will not identify these patterns unless 
students are sampled at the level of the major
and will likely provide erroneous information
leading to misdirected intervention. It is akin to
confounding within-group effects with between-group
effects and thereby conveying little of
importance (Zwick, Brown, & Sklar, 2004). Given
the importance of academic program, sample-based
statistics are of questionable value in a high-stakes
environment.
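The cost of not sampling at the level of the major can be estimated directly. Assuming simple random sampling and the reported-score standard deviation of 2 used in this study, the sketch below computes the expected number of respondents from a hypothetical major enrolling 1% of undergraduates, and the standard error of that major's mean, for several overall sample sizes.

```python
import math

REPORTED_SD = 2.0  # reported factor scores are scaled to an SD of 2


def major_precision(sample_size, major_share, sd=REPORTED_SD):
    """Expected respondents from one major under simple random sampling,
    and the standard error of that major's mean reported score."""
    expected_n = sample_size * major_share
    standard_error = sd / math.sqrt(expected_n)
    return expected_n, standard_error


# Hypothetical major enrolling 1% of undergraduates.
for sample_size in (1_000, 5_000, 20_000):
    n, se = major_precision(sample_size, 0.01)
    print(f"sample {sample_size:>6}: ~{n:.0f} respondents, SE = {se:.2f}")
```

A sample of 1,000 yields about 10 respondents from such a major and a standard error of roughly 0.63, larger than the 0.4 difference treated as noticeable in the Results section; only much larger samples, or sampling stratified by major, bring that error down.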

Because there are known academic 
engagement differences by major and little 
evidence of common experience among 
students at large institutions, this research asserts 
that institution-level measures of academic 
engagement are of limited use and mask more 
valid measures at the level of academic discipline. 
In fact, institution-level measures might well be 
a better reflection of program mix than campus 
performance. The obvious alternative to sample-based
study or to a census study conducted
at a single campus is census-based collection 
across multiple campuses. Until recently, the 
resource expenditure to survey more than 
100,000 students distributed across a state would 
have been prohibitive, but Internet delivery 
and email contact make multi-campus census 
surveys a viable alternative. In addition, a log-in 
process can be used to identify responses for the 
purpose of linking questionnaire data with other 
student records. The resulting merged record is 
an exceptional resource for academic inquiry and 
administrative needs. 

Methodology 

The 2006 UCUES survey (Chatman, 2007), 
which included all undergraduate students 
attending a major public research university 
system (~153,000), attained a 38% response
rate overall (~58,000 responses). Each student
received a common core set of items and 
one of five randomly assigned modules: 
academic experience, civic engagement, 
student development, student services, or a 
campus-specific module (optional). Because 
the campuses share many similarities, including 
programs offered and selective admissions, 
these data should provide a unique opportunity 
to determine the extent to which academic
experience varied by academic program and, 
if variance is observed, the extent to which 
programs can be combined based on similarity 
of student responses into fewer clusters. The 
process required two clusterings: a reduction 
of survey items into factor scores and a 
clustering of academic majors based on those 
factor scores. The analysis used the work of 
Luan, Zhao, and Hayek (2005) as a model, and 
focused on academic core items as the most 
salient assessment dimension. Institutional 
differences were controlled by restricting study 
to the undergraduate student bodies of eight 
similar institutions of one university system. 
Analysis was further restricted to upper-division 
students with declared majors. These actions 
increase the likelihood of useful results but may 
limit generalization to large public research 
universities. 

Results 

Factor Scores 

The UCUES factor analysis of the upper-division
academic core was a statistically driven
"consensus of judgment" process. The bulk of
the analysis was performed by a seven-person
team of faculty and institutional research and
UCUES project representatives during a day-long
working session where alternatives were
considered in real time by running the programs 
and examining results collectively. The solution 
was done in two stages. The first stage identified 
principal components and used orthogonal 
solutions. The second stage was performed within 
each principal component set and used oblique 
solutions, as it was understood that items within a 
principal component would be correlated. Again, 
consensus judgment regarding the best solution 
was used. The resulting solutions very closely 
followed empirical results but final placement was 
supplemented by judgment-based movement of 
a handful of items from one subfactor to another. 
The first session was followed by two shorter
meetings during which factor names were
attached and minor revisions were made. The 
final result was a solution with seven principal 
components. The factor names and their internal 
consistency (Cronbach's coefficient alpha) were: 

Factor 1: Satisfaction with Educational 
Experience (.92) 

Factor 2: Current Skills Self-Assessment 
(Nonquantitative) (.91) 

Factor 3: Gains in Self-Assessment of Skills 
(Nonquantitative) (.89) 

Factor 4: Development of Scholarship (.89)

Factor 5: Understanding Other Perspectives
(.85)

Factor 6: Research Experiences (.69) 

Factor 7: Quantitative Professions (.64) 
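The internal-consistency values in parentheses are Cronbach's coefficient alpha, alpha = k/(k - 1) x (1 - sum of item variances / variance of the summed scale), where k is the number of items. A minimal Python sketch of that formula with a small hypothetical response matrix (not UCUES data):

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's coefficient alpha.

    responses: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Population variances are used for both terms; the choice cancels in the ratio.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical four-item scale answered by five students on a 1-6 scale.
data = [
    [5, 6, 5, 6],
    [4, 4, 5, 4],
    [2, 3, 2, 3],
    [6, 5, 6, 6],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(data), 2))  # → 0.96
```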

The factor solution process, factor loadings,
eigenvalues, and related psychometric results are
described in detail elsewhere (Chatman, 2007), 
but a brief description of principal factors will 
be provided here. Satisfaction with Educational 
Experience was composed of 30 survey items,
including global satisfaction with GPA, social
experience, academic experience, etc., but
mostly consisted of items regarding the major
(e.g., advising, access, instruction). Current Skills
Self-Assessment (Nonquantitative) was 13 self-ratings
of general, research, and personal skills.
The third factor was the difference between
skills at entry and skills as currently rated, for
the skills comprising the second factor. Development
of Scholarship consisted of a series of items reflecting
Bloom's taxonomy, including critical reasoning
and assessment, curricular foundations for
reasoning, and elevated academic effort. The fifth
factor concerned development of an appreciation 
and understanding of the perspectives of others, 
based on interactions with students of different 
race, religion, gender, nationality, economic 
circumstance, or sexual orientation. Research 
Experiences was a group of six items included
to reflect the unique opportunities available to
students at a research university. The seventh
factor, Quantitative Professions, included
quantitative skills, collaborative learning
experiences, and three items about choice of
major (remuneration, prestige, and fulfillment).
One additional scalelet (Pike, 2006) was used,
Academic Time (time in class or lab and academic
preparation). Factor scores were computed as
the standardized mean of standardized item 
scores. In other words, item responses were first 
standardized, and the mean of those responses 
was computed for each student. These first two 
steps produced the raw factor scores. The raw 
factor scores were then standardized to produce 
a reported score with a mean of 5 and a standard 
deviation of 2 at the direction of the project's 
steering committee. Standardized factor scores 
were to be part of an academic profile report, 
and it was decided that this scale avoided 
confusion with other metrics and expanded the 
effective range from 1 to 9 for individual scores. 
While much smaller differences were statistically 
significant, a difference of 0.4 in reported scores 
would suggest a noticeable difference. For 
groups of these sizes, a difference of about 0.2 in
reported scores would be statistically significant at
the 95% confidence level.
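The scoring steps just described (standardize each item, average a student's standardized items, standardize that average, then rescale to mean 5 and SD 2) can be sketched as follows. The use of population SDs, unweighted item averaging, and complete responses are assumptions, as is the detectable-difference helper, which applies a two-sided normal approximation (critical value 1.96) to independent groups on the reported scale. The responses are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to mean 0, SD 1 (population SD)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def reported_factor_scores(responses, center=5.0, spread=2.0):
    """responses: one list of item scores per student, all items in one factor."""
    # Step 1: standardize each item across students.
    z_items = [zscores(list(item)) for item in zip(*responses)]
    # Step 2: raw factor score = mean of each student's standardized items.
    raw = [mean(student) for student in zip(*z_items)]
    # Step 3: standardize the raw scores, then rescale to mean 5, SD 2.
    return [center + spread * z for z in zscores(raw)]


def detectable_difference(n1, n2, sd=2.0, z_crit=1.96):
    """Smallest difference in group means significant at ~95% (two-sided),
    assuming independent groups and a common SD on the reported scale."""
    return z_crit * sd * math.sqrt(1 / n1 + 1 / n2)


# Hypothetical: six students answering a three-item factor.
responses = [[4, 5, 3], [6, 6, 5], [2, 3, 2], [5, 4, 4], [3, 3, 4], [6, 5, 6]]
scores = reported_factor_scores(responses)
print([round(s, 1) for s in scores])
print(round(detectable_difference(600, 600), 2))  # → 0.23
```

For two groups of about 600 students the helper gives a threshold of roughly 0.23, in line with the figure of about 0.2 given above.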

Academic Major Clusters 

Student major was assigned to one of 19
disciplinary clusters using local conventions. The 
clusters were similar to the level of aggregation 
achieved using a two-digit CIP code (e.g., 
communications, engineering, social sciences, 
biological sciences, letters, agriculture). Factor 
mean scores by discipline were computed for 
areas with 100 or more responding students. 
Those mean area scores were subjected to cluster 
analysis using an agglomerative hierarchical 
clustering based on centroid distance. There 
appeared to be a natural and reasonable cutoff 
at about 0.7 that produced seven clusters that 
are shown in Figure 1 and with a more complete 
description of the mapping of majors to clusters 
in Table 1 . 
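The clustering step can be illustrated with a deliberately naive centroid-linkage implementation in Python with NumPy, applied to the 19 area means reported in Table 1. This is a sketch under stated assumptions: each area is treated as one unweighted observation, distances are plain Euclidean distances on the reported 1 to 9 scale, and the stopping rule is a simple cut height. The paper does not state its software, distance scaling, or weighting, so a threshold of 0.7 here will not necessarily reproduce the seven clusters of Figure 1. In practice, `scipy.cluster.hierarchy.linkage(X, method="centroid")` followed by `fcluster(Z, t, criterion="distance")` is the usual library route.

```python
import numpy as np

# Mean factor scores (F1-F7, FTb) for the 19 disciplinary areas, from Table 1.
AREA_MEANS = {
    "Agriculture":               [5.5, 4.9, 4.9, 5.1, 4.7, 5.1, 5.2, 5.2],
    "Architecture":              [4.8, 5.0, 5.3, 5.1, 5.3, 5.0, 4.9, 5.3],
    "Social Sciences":           [5.3, 5.4, 5.2, 5.1, 5.2, 4.8, 4.4, 4.7],
    "Communications":            [5.2, 5.5, 5.2, 5.0, 5.1, 4.8, 4.3, 4.6],
    "Education":                 [5.0, 5.5, 5.6, 5.1, 5.3, 4.9, 4.9, 4.9],
    "Public Administration":     [5.5, 5.2, 5.4, 5.0, 5.3, 5.0, 4.1, 4.5],
    "Law":                       [5.3, 5.4, 5.2, 5.5, 5.2, 4.7, 4.8, 4.6],
    "Interdisciplinary Studies": [5.2, 5.3, 5.4, 5.1, 5.2, 5.2, 4.6, 4.9],
    "Foreign Languages":         [5.6, 5.2, 4.8, 5.0, 5.2, 4.8, 3.9, 4.8],
    "Letters":                   [5.4, 5.5, 4.8, 5.2, 5.0, 4.8, 3.8, 4.7],
    "Psychology":                [5.0, 5.1, 5.1, 4.9, 5.0, 5.5, 4.5, 4.7],
    "Fine Arts":                 [5.1, 5.5, 4.9, 5.0, 5.1, 5.0, 4.2, 5.2],
    "Biological Sciences":       [4.9, 4.7, 4.9, 5.0, 4.9, 5.6, 5.4, 5.2],
    "Physical Sciences":         [5.1, 4.6, 4.7, 5.1, 4.7, 5.6, 5.9, 5.3],
    "Area and Ethnic Studies":   [5.7, 5.7, 5.9, 5.6, 5.9, 5.4, 4.0, 5.1],
    "Mathematics":               [4.9, 4.3, 4.4, 4.8, 4.6, 4.5, 5.9, 5.1],
    "Computer Science":          [4.7, 4.7, 4.5, 4.4, 4.2, 4.6, 6.2, 5.3],
    "Business Administration":   [4.8, 5.0, 5.1, 4.6, 5.0, 4.4, 6.6, 4.6],
    "Engineering":               [4.7, 4.6, 4.8, 5.0, 4.7, 5.3, 6.6, 5.3],
}


def centroid_clusters(means, threshold):
    """Naive agglomerative clustering with centroid linkage.

    Each area starts as its own cluster. At every step the two clusters whose
    centroids (unweighted mean of member rows) are closest in Euclidean
    distance are merged; merging stops once that closest distance exceeds
    `threshold`, which plays the role of the cut height in a dendrogram.
    """
    names = list(means)
    X = np.array([means[n] for n in names], dtype=float)
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > 1:
        centroids = np.array([X[c].mean(axis=0) for c in clusters])
        diff = centroids[:, None, :] - centroids[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)  # ignore self-distances
        a, b = sorted(np.unravel_index(np.argmin(dist), dist.shape))
        if dist[a, b] > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into a
        del clusters[b]
    return [[names[i] for i in c] for c in clusters]


if __name__ == "__main__":
    # The cut height is an assumption; see the note above on distance scaling.
    for group in centroid_clusters(AREA_MEANS, threshold=0.7):
        print(group)
```

Lowering the threshold leaves more areas as singletons and raising it merges more of them; the "natural" cutoff is the height at which the dendrogram shows a long gap between successive merges.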

The resulting academic topology creates an
interesting mix, with many clusters confirming
conventional wisdom and others raising
unexpected questions. One of the surprises was
that area, ethnic, cultural, and gender studies
(Area) was quickly distinguished from other
majors. (When the scores are shown graphically
in the following section, area, ethnic, cultural,
and gender studies presents a remarkably strong
profile from an engagement perspective.) The
next content areas to separate from the pack
were engineering, business administration,
mathematics, and computer science. Physical
sciences and biological sciences remained with
the social sciences, the humanities, and an
agriculture and architecture cluster pair in the
large majority group. If
an institution were to create academic divisions
to reflect this topology, the schools and colleges
would probably be agriculture; architecture;
humanities and social sciences; biological and
physical sciences; area and ethnic studies;
mathematics and computer science; business
administration; and engineering. This seven-cluster
solution was used to illustrate how factor scores
vary across disciplinary clusters.
Figure 1. Empirically derived structure of the
University (centroid hierarchical cluster analysis:
agglomerative at distance ~0.7).
Table 1

Factor Scores for Principal Components by Disciplinary Clusters

                                  Principal Component Factors
Disciplinary Area                  F1   F2   F3   F4   F5   F6   F7  FTb      N*      %
Agriculture                       5.5  4.9  4.9  5.1  4.7  5.1  5.2  5.2     601   2.5%
Architecture                      4.8  5.0  5.3  5.1  5.3  5.0  4.9  5.3     210   0.9%
Agriculture & Architecture        5.7  5.7  5.7  5.0  5.0  5.4  4.7  4.9            17%
Social Sciences                   5.3  5.4  5.2  5.1  5.2  4.8  4.4  4.7   5,214  21.6%
Communications                    5.2  5.5  5.2  5.0  5.1  4.8  4.3  4.6     542   2.2%
Education                         5.0  5.5  5.6  5.1  5.3  4.9  4.9  4.9      78   0.3%
Public Administration             5.5  5.2  5.4  5.0  5.3  5.0  4.1  4.5     111   0.5%
Law                               5.3  5.4  5.2  5.5  5.2  4.7  4.8  4.6     175   0.7%
Interdisciplinary Studies         5.2  5.3  5.4  5.1  5.2  5.2  4.6  4.9     950   3.9%
Foreign Languages                 5.6  5.2  4.8  5.0  5.2  4.8  3.9  4.8     622   2.6%
Letters                           5.4  5.5  4.8  5.2  5.0  4.8  3.8  4.7   1,631   6.8%
Psychology                        5.0  5.1  5.1  4.9  5.0  5.5  4.5  4.7   2,175   9.0%
Fine Arts                         5.1  5.5  4.9  5.0  5.1  5.0  4.2  5.2   1,415   5.9%
Humanities & Social Science       5.3  5.4  5.7  5.7  5.2  4.8  4.2  4.8            40%
Biological Sciences               4.9  4.7  4.9  5.0  4.9  5.6  5.4  5.2   2,660  11.0%
Physical Sciences                 5.1  4.6  4.7  5.1  4.7  5.6  5.9  5.3   1,068   4.4%
Biological & Physical Sciences    4.9  4.7  4.9  5.0  4.9  5.6  5.6  5.2            15%
Area and Ethnic Studies           5.7  5.7  5.9  5.6  5.9  5.4  4.0  5.1     555     2%
Mathematics                       4.9  4.3  4.4  4.8  4.6  4.5  5.9  5.1     539   2.2%
Computer Science                  4.7  4.7  4.5  4.4  4.2  4.6  6.2  5.3     634   2.6%
Mathematics & Computer Science    4.8  4.5  4.5  4.6  4.4  4.5  6.1  5.2             5%
Business Administration           4.8  5.0  5.1  4.6  5.0  4.4  6.6  4.6   1,096     5%
Engineering                       4.7  4.6  4.8  5.0  4.7  5.3  6.6  5.3   3,878    16%
Maximum                           5.7  5.7  5.9  5.6  5.9  5.6  6.6  5.3
Minimum                           4.7  4.3  4.4  4.4  4.2  4.4  3.8  4.5
Range                             1.1  1.4  1.5  1.2  1.7  1.2  2.8  0.8

Factor Structure
F1   Factor 1: Satisfaction with Educational Experience
F2   Factor 2: Current Skills Self-Assessment (Nonquantitative)
F3   Factor 3: Gains in Self-Assessment of Skills (Nonquantitative)
F4   Factor 4: Development of Scholarship
F5   Factor 5: Understanding Other Perspectives
F6   Factor 6: Research Experiences
F7   Factor 7: Quantitative Professions
FTb  Factor Time, Subfactor Tb: Academic Time

* Minimum number of students used in computing a factor score for this discipline.
Factor Scores of Academic Major Clusters

Scores on the first factor, Satisfaction with 
Educational Experience, were highest in area and 
ethnic studies, agriculture and architecture, and 
humanities and social sciences. Satisfaction was 
lower in mathematics and computer science, 
business administration, and engineering (Figure 
2). With a few position changes, the second factor, 
Current Skills Self-Assessment (Nonquantitative), 
was similarly arranged (Figure 3). Area and ethnic 
studies and humanities and social sciences were 
at the upper end and mathematics and computer 
science and engineering were at the lower end. The 
profile for the third factor, Gains in Self-Assessment 
of Skills (Nonquantitative), was very much like 
that of the second factor but with more variance 
at the extremes (Figure 4). Area and ethnic studies 
was more clearly distanced at the upper end and 
mathematics and computer science was more 
clearly distanced at the lower end. The fourth factor, 
Development of Scholarship, found four areas close 
to the overall mean: humanities and social sciences, 
biological and physical sciences, agriculture and 
architecture, and engineering (Figure 5). Again, 
distinguished at the upper end was area and ethnic 
studies. The lower end was held by mathematics 
and computer science and business administration. 
The fifth factor, Understanding Other Perspectives, 
was thankfully highest in area and ethnic studies 
and unfortunately, but perhaps as expected, lowest 
in engineering and mathematics and computer 
science (Figure 6). Research Experiences, the 
sixth factor, presented the first major reordering 
with biological and physical sciences, area and 
ethnic studies, and engineering leading the array. 
Mathematics and computer science and business 
administration were at the lower end of the array 
(Figure 7). Quantitative Professions, the seventh 
factor, confirmed expectations with engineering, 
business administration, and mathematics and 
computer science leading and humanities and 
social sciences and area and ethnic studies trailing 
(Figure 8). The AcademicTime Subfactor (treated as
a principal factor here) placed science,
engineering, and mathematics (SEM) fields highest
and humanities and social sciences, area and ethnic
studies, and business administration lowest
(Figure 9).

The relative variance explained by discipline
and campus was determined for the eight factors
(Table 2). In all cases, disciplinary cluster explained
more variance in factor scores than did campus, with
F values about twice as large for most factors,
much larger for Research Experiences (Factor 6)
and the AcademicTime Subfactor, and much larger
still for Quantitative Professions (Factor 7). It was
also notable that the interaction of discipline and
campus was much less important than either main
effect and was of no practical consequence.
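The comparison of main effects can be checked directly against Table 2. The Python sketch below divides each factor's Disciplinary Cluster F value by its Campus F value. This ratio of F statistics is only a rough proxy chosen for illustration; it is not necessarily the variance-explained ratio quoted in the next paragraph, whose exact computation is not stated, although it ranks the eight factors in the same order.

```python
# F values from Table 2 as (Disciplinary Cluster, Campus).
F_VALUES = {
    "Factor 1": (75.9, 43.8),
    "Factor 2": (175.5, 100.0),
    "Factor 3": (55.2, 36.6),
    "Factor 4": (30.6, 12.9),
    "Factor 5": (72.9, 24.0),
    "Factor 6": (118.4, 18.0),
    "Factor 7": (1396.7, 41.2),
    "Academic Time": (339.2, 67.3),
}


def f_ratios(f_values):
    """Ratio of the cluster F value to the campus F value for each factor."""
    return {name: cluster / campus for name, (cluster, campus) in f_values.items()}


def rank_by_ratio(f_values):
    """Factor names ordered from most to least discipline-dominated."""
    ratios = f_ratios(f_values)
    return sorted(ratios, key=ratios.get, reverse=True)


if __name__ == "__main__":
    ratios = f_ratios(F_VALUES)
    for name in rank_by_ratio(F_VALUES):
        print(f"{name:14s} {ratios[name]:6.2f}")
```

For Factors 1 through 4 the F-value ratios fall between roughly 1.5 and 2.4, which is the sense in which the cluster effect is "about twice" the campus effect for most factors.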

The ratio of variance explained by discipline to
variance explained by campus favored discipline
in all cases. The ratios by factor, from largest
to smallest, were 30.1 for Factor 7, Quantitative
Professions; 8.3 for Factor 6, Research Experiences;
4.4 for the AcademicTime Subfactor; 2.8 for Factor
5, Understanding Other Perspectives; 1.9 for Factor
4, Development of Scholarship; 1.8 for Factor 2,
Current Skills Self-Assessment (Nonquantitative);
1.4 for Factor 1, Satisfaction with Educational
Experience; and, smallest, 1.2 for Factor 3, Gains in
Self-Assessment of Skills (Nonquantitative).

Summary 

Previous research suggested disciplinary
differences in educational engagement specifically
and in the academic experience generally. This
project confirmed that differences do exist across
a large public research university system; that the
pattern of traditional engagement differences
tends to favor the social sciences, arts, and humanities;
and that, when items focused on research and
collaborative learning are included, factors emerge
that favor students in mathematics, computer science,
engineering, and business administration. The
most important result is that academic experience
and student engagement vary by program of
study in predictable ways. What does this finding
mean for instruction?
Table 2

Factor Score Differences by Disciplinary Cluster and Campus

Factor / Class Variable           Mean Square  F Value(a)    Pr > F

Factor 1: Satisfaction with Educational Experience (Corrected Total N = 25,465)
  Disciplinary Cluster                   73.0        75.9   <0.0001
  Campus                                 42.2        43.8   <0.0001
  Disciplinary Cluster * Campus           3.9         4.1   <0.0001

Factor 2: Current Skills Self-Assessment (Nonquantitative) (Corrected Total N = 25,813)
  Disciplinary Cluster                  158.0       175.5   <0.0001
  Campus                                 90.0       100.0   <0.0001
  Disciplinary Cluster * Campus           0.0         0.0    1.0000

Factor 3: Gains in Self-Assessment of Skills (Nonquantitative) (Corrected Total N = 25,809)
  Disciplinary Cluster                   52.9        55.2   <0.0001
  Campus                                 35.0        36.6   <0.0001
  Disciplinary Cluster * Campus           1.9         2.0    0.0002

Factor 4: Development of Scholarship (Corrected Total N = 23,905)
  Disciplinary Cluster                   30.2        30.6   <0.0001
  Campus                                 12.8        12.9   <0.0001
  Disciplinary Cluster * Campus           3.3         3.4   <0.0001

Factor 5: Understanding Other Perspectives (Corrected Total N = 25,780)
  Disciplinary Cluster                   71.0        72.9   <0.0001
  Campus                                 23.4        24.0   <0.0001
  Disciplinary Cluster * Campus           1.5         1.5    0.0218

Factor 6: Research Experiences (Corrected Total N = 25,838)
  Disciplinary Cluster                  113.4       118.4   <0.0001
  Campus                                 17.2        18.0   <0.0001
  Disciplinary Cluster * Campus           6.1         6.4   <0.0001

Factor 7: Quantitative Professions (Corrected Total N = 25,832)
  Disciplinary Cluster                 1048.3      1396.7   <0.0001
  Campus                                 30.9        41.2   <0.0001
  Disciplinary Cluster * Campus           6.3         8.4   <0.0001

Factor Time, Subfactor Tb: Academic Time (Corrected Total N = 25,662)
  Disciplinary Cluster                  307.8       339.2   <0.0001
  Campus                                 61.0        67.3   <0.0001
  Disciplinary Cluster * Campus           1.0         1.1    0.3214

(a) Degrees of freedom for the numerator were 6 for Cluster, 7 for Campus, and 42 for the interaction. The df in the denominator
averaged 25,513, with variation coming from missing data.
Figure 2. Satisfaction with Educational Experience (Factor 1).

Figure 3. Current Skills Self-Assessment, Nonquantitative (Factor 2).

Figure 4. Gains in Self-Assessment of Skills, Nonquantitative (Factor 3).

Figure 5. Development of Scholarship (Factor 4).

Figure 6. Understanding Other Perspectives (Factor 5).

Figure 7. Research Experiences (Factor 6).

Figure 8. Quantitative Professions (Factor 7).

Figure 9. Subfactor Academic Time (Factor Time, Part b).
When they reached a similar point in their
papers, Nelson Laird et al. (2005, 2006) and Brint et
al. (2008) began to suggest ways that instruction
might be improved in the lower-ranking fields
(Nelson Laird et al.) or that the better aspects
of various fields might be used for common
improvement (Brint et al.). These studies suggested
that differences in educational experience between
disciplines should be reduced. Whether they should
be reduced is not a question this research addresses,
although it seems clear that more research
is needed to understand why instructional practices
differ by academic discipline before recommending
that they be changed. After all, many of the
programs described here are considered among
the best in the country. Instead of suggesting
changes, this research was solely concerned with
demonstrating that important differences do exist
by academic discipline and that these differences
lead to misleading conclusions when one program
is compared to a campus average and when one
campus is compared to another. Actions taken on
the basis of such erroneous conclusions could hardly
succeed. Worse, most institutions of higher
education remain unaware of these real differences
because they rely on easily obtained statistical
samples that cannot support analysis at the level
of an academic discipline.

There is real danger in embracing the Spellings
Commission recommendation to use widely
available student engagement assessments to
compare the performance of one institution with
another. Institution-level scores are simply
inadequate. Unless the campuses to be compared
are composed of the same programs in the same
proportions, the comparison will necessarily
be biased by program composition. To illustrate
this point, bachelor's degrees awarded by Association
of American Universities (AAU) institutions were
classified into this study's seven clusters and assigned
the mean values found in this study. The results
were then rank ordered. Taking the first factor,
Satisfaction with Educational Experience, as an
example, Harvard would be predicted to score
very high because it has one of the highest
proportions of humanities and social sciences
students and few, if any, students in business
administration, engineering, or mathematics
and computer science. Georgia Tech would be
predicted to score low because it has one of the
highest concentrations of engineering students
and a very small proportion of humanities and
social sciences students. In other words, the 62
AAU institutions can be rank-ordered based solely
on disciplinary composition and the tendency of
students in different disciplines to respond differently.
Here are some of the hypothetical results, with the
range being the difference between the highest and
lowest predicted scores:

Factor 1: Satisfaction with Educational Experience
Top Five: Brandeis, Yale, Harvard, Catholic University, NYU
Range: 0.44

Factor 2: Current Skills Self-Assessment (Nonquantitative)
Top Five: NYU, Brandeis, Yale, Oregon, Emory
Range: 0.64

Factor 3: Gains in Self-Assessment of Skills (Nonquantitative)
Top Five: Brandeis, NYU, Yale, Emory, Oregon
Range: 0.31

Factor 4: Development of Scholarship
Top Five: Brandeis, Yale, Princeton, Cal-Davis, Harvard
Range: 0.24

Factor 5: Understanding Other Perspectives
Top Five: Brandeis, Yale, NYU, Emory, North Carolina
Range: 0.41

Factor 6: Research Experiences
Top Five: Caltech, Cal-Davis, Princeton, Case Western, Duke
Range: 0.48

Factor 7: Quantitative Professions
Top Five: Georgia Tech, MIT, Caltech, Purdue, Case Western
Range: 1.77

Academic Time
Top Five: Caltech, Georgia Tech, MIT, Case Western, Purdue
Range: 0.95

The point of this example is that substantive 
differences in scale scores can occur as a result 
of nothing more than disciplinary composition. 
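The composition effect can be made concrete with a short Python sketch. The paper does not list the AAU degree counts or the exact weighting it used, so this assumes a simple share-weighted average of the seven cluster means for Factor 1 taken from Table 1; the two institutions and their degree counts are invented for illustration and are not AAU data.

```python
# Factor 1 (Satisfaction) means for the seven clusters, from Table 1.
F1_CLUSTER_MEANS = {
    "Agriculture & Architecture": 5.7,
    "Humanities & Social Science": 5.3,
    "Biological & Physical Sciences": 4.9,
    "Area and Ethnic Studies": 5.7,
    "Mathematics & Computer Science": 4.8,
    "Business Administration": 4.8,
    "Engineering": 4.7,
}


def predicted_score(degree_counts, cluster_means):
    """Degree-share-weighted average of cluster means: the score an institution
    would get if its students answered exactly like the study's cluster averages."""
    total = sum(degree_counts.values())
    return sum(n / total * cluster_means[c] for c, n in degree_counts.items())


def rank_institutions(institutions, cluster_means):
    """(name, predicted score) pairs from highest to lowest, plus the range."""
    scored = sorted(
        ((name, predicted_score(counts, cluster_means))
         for name, counts in institutions.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored, scored[0][1] - scored[-1][1]


# Hypothetical degree counts (not real institutions or AAU data).
HYPOTHETICAL = {
    "Humanities-heavy U": {"Humanities & Social Science": 600,
                           "Biological & Physical Sciences": 200,
                           "Agriculture & Architecture": 150,
                           "Area and Ethnic Studies": 50},
    "Engineering-heavy Tech": {"Engineering": 600,
                               "Mathematics & Computer Science": 100,
                               "Business Administration": 100,
                               "Humanities & Social Science": 200},
}

if __name__ == "__main__":
    ranked, spread = rank_institutions(HYPOTHETICAL, F1_CLUSTER_MEANS)
    for name, score in ranked:
        print(f"{name:24s} {score:.2f}")
    print(f"range {spread:.2f}")
```

For these made-up profiles the predicted scores are 5.30 and 4.84, a spread of 0.46 on the reported scale that arises from program mix alone, with no difference in how well either institution teaches.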

Even when two campuses are composed of the 
same programs in the same proportions, the 
summary score will most likely not reflect relative 
performance at the level of interest to faculty and
students: the academic major or discipline. Simple
measures to respond to public accountability 
desires may be more easily constructed for 
elementary schools and even for secondary schools 
because of curricular similarities, but the curriculum 
and curricular offerings of postsecondary schools 
appear to be too complex to be effectively reduced 
to a few numbers. If public accountability demands 
comparative performance, then the unit of analysis 
for performance should be the academic discipline. 

An obvious limitation of this study results from 
the academic structure used to initially combine 
academic majors into a smaller number of units 
(equivalent to two-digit CIP). The same arguments 
that this research made about the dangers of 
aggregation could extend to combining majors 
within any group. For example, there might be 
important differences between civil and mechanical 
engineering, or the combination of programs within 
agriculture may mask the same type of differences 
seen at the campus level. 

Setting those concerns aside for the moment,
the relative validity of measures from derived
disciplinary clusters and from institutional samples
matters both for understanding the student
experience in higher education and whenever
survey outcomes are used as accountability
measures for comparing institutional performance.
Perhaps the most valuable
contribution of disciplinary-based measures is in
program review, because program review happens 
at the level of the major, where faculty recognize 
and bear responsibility for the academic experience. 

Once it is recognized that institution-level 
measures are of questionable validity, leading 
to erroneous conclusions and offering little, if 
any, direction for improvement, it is obvious that 
accountability demands more. Imagine reporting 
to Procter & Gamble (P&G) shareholders that
consumers of its products were less satisfied 
than those who used Unilever's products. P&G 
produces about 100 brands distributed over 
about 25 categories, not so different from a large 
public research university. Unilever has about 30 
brands, many competing for the same markets. 
Imagine that your research was based on a sample
of P&G consumers, and that you were not able to report
satisfaction by product line or to compare satisfaction
with each product line against its competitors.
How would P&G begin to address the problem? 
Which division head would acknowledge that his 
or her brand was partially responsible for the lower 
score and should therefore be the one to improve? 
What reception would your report receive? More 
importantly, what reception should your report 
receive? Universities faced with the Spellings 
Commission's recommendation need to think about 
these types of questions. 